The large scale structure of metabolic networks: a glimpse at life's origin?
Abstract
The number of interactions per node in metabolic network graphs follows a power law. I discuss two possible evolutionary explanations of this feature. The first relates to the observation that graphs with this degree distribution are robust to random node removal. This observation has led to the hypothesis that metabolic networks show power-law degree distributions because such distributions endow them with robustness against perturbations. However, abiotic chemical reaction networks also show this degree distribution, which makes robustness a less likely cause of this distribution in metabolism. Second, this power-law distribution may be the result of the growth of metabolism through addition of metabolites over billions of years. A variety of graph-growth models that lead to similarly structured graphs predict that highly connected network nodes are old nodes. Empirical evidence supports this prediction for metabolic networks.

Metabolism comprises the network of interactions that provide energy and building blocks for cells and organisms, a network sustaining the living and allowing it to grow and reproduce. For well-studied organisms, especially microbes such as Escherichia coli, considerable information about metabolic reactions has been accumulated through decades of experimental work. This information, originally scattered through thousands of papers of original literature, has increasingly found its way into larger collections, including encyclopedias (Neidhardt 1996) and on-line databases (Karp et al. 1999; Ogata 1999). With easier availability of this information, it has become feasible to map the structure of large pieces of an organism's metabolic network.

Representing metabolic networks. David Fell and I (Fell and Wagner 2000; Wagner and Fell 2001) assembled a list of 317 stoichiometric equations involving 287 substrates that represent the central routes of energy metabolism and small-molecule building block synthesis in E. coli.
Because there is considerable variation in the metabolic reactions realized under different environmental conditions, we included only reactions that would occur under one condition: aerobic growth on minimal medium with glucose as sole carbon source and O2 as electron acceptor. We also deliberately omitted (i) reactions whose occurrence is reportedly strain-dependent (Neidhardt 1996), (ii) biosyntheses of complex cofactors (e.g., adenosylcobalamin), which are not fully understood, and (iii) syntheses of most polymers (RNA, DNA, protein), because of their complex stoichiometry.

When faced with a complex assemblage of chemical reactions, the problem immediately arises of how to represent the resulting reaction network. Importantly, for most reactions only qualitative information is available: one may know the substrates and stoichiometry of a reaction but not much more. A mathematical representation that captures such qualitative information is that of a graph, for example a substrate graph G_S = (V_S, E_S). Its vertex set V_S consists of all chemical compounds (substrates) that occur in the network. Two substrates S_1 and S_2 are adjacent, i.e., there exists an edge e = (S_1, S_2) ∈ E_S, the edge set of this graph, if they occur (either as substrates or products) in the same chemical reaction. Such a network representation has the advantage of being intuitive and simple. Other graph-like representations of metabolic networks are possible, including bipartite graphs and hypergraphs (Graham et al. 1995). However, hypergraphs are much less intuitive constructs than graphs, and the many analysis tools available for graphs have not yet been developed to the same extent for other graph representations. One might argue that the existence of irreversible chemical reactions would suggest a directed graph representation (Graham et al. 1995), i.e., a graph where each edge has a direction.
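As a minimal sketch of the substrate graph defined above, the following Python fragment builds the undirected graph from a list of reactions and reads off each vertex degree. The three reactions are a simplified, illustrative stand-in and are not taken from the 317-equation list; the function name `substrate_graph` and the data layout (pairs of substrate and product lists) are assumptions made for this example.

```python
from collections import defaultdict
from itertools import combinations


def substrate_graph(reactions):
    """Return an undirected substrate graph as a dict: compound -> set of neighbors.

    Two compounds are adjacent if they occur, as substrate or product,
    in the same reaction. Stoichiometry and direction are ignored.
    """
    adjacency = defaultdict(set)
    for substrates, products in reactions:
        compounds = set(substrates) | set(products)
        for compound in compounds:
            adjacency[compound]  # make sure isolated compounds become vertices
        for a, b in combinations(sorted(compounds), 2):
            adjacency[a].add(b)
            adjacency[b].add(a)
    return dict(adjacency)


# Hypothetical toy input: three simplified reactions, for illustration only.
toy_reactions = [
    (["glucose", "ATP"], ["glucose-6-phosphate", "ADP"]),
    (["glucose-6-phosphate"], ["fructose-6-phosphate"]),
    (["fructose-6-phosphate", "ATP"], ["fructose-1,6-bisphosphate", "ADP"]),
]

graph = substrate_graph(toy_reactions)

# Vertex degree d: the number of other compounds a compound is connected to.
degree = {compound: len(neighbors) for compound, neighbors in graph.items()}

for compound, d in sorted(degree.items(), key=lambda item: -item[1]):
    print(f"{compound}: {d}")
```

Even in this toy example the cofactors ATP and ADP end up with the highest degree, because they take part in more than one reaction. Every edge is stored in both directions, which matches the undirected representation.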
However, a directed graph representation would be inappropriate for an important application of graph representations of biological networks: to assess qualitatively how perturbations of either enzyme concentrations (via mutation) or substrate concentrations (via changes in consumption or availability) propagate along the network. The reason is that even for irreversible reactions, the concentration of a reaction product potentially affects the reaction rate by occupancy of the enzyme's active site. Reaction products can thus affect substrate concentrations "upstream" of irreversible reactions.

What is the structure of the E. coli substrate graph? It has much in common with graphs found in areas as different as computer science (the world-wide web) and sociology (friendship and collaboration networks). It is a small-world network (Watts 1999), meaning that any two nodes (substrates) can be reached from each other through a path of very few edges, fewer than in other graphs of comparable size. Also, the distribution of the vertex degree d, the number of edges connecting each substrate to other substrates, follows a power law, i.e., the probability P(d) of finding a vertex with degree d is P(d) ∝ d^(−τ) (Fig. 1). (The exponent τ is less than two but cannot be estimated very accurately because of the small network size.) This degree distribution has also been found for reaction networks derived from different organisms (Jeong et al. 2000). It seems to be a universal characteristic of metabolic networks.

Power-laws and robustness. Two complementary hypotheses figure prominently in explaining power-law degree distributions. First, Albert and collaborators (Albert et al. 2000) found that networks with power-law distributed degrees are robust to random perturbations in the following sense. Upon removal of randomly chosen nodes, the mean distance between network nodes that can still be reached from each other (via a path of edges) increases only very little.
This distance is also known as the network diameter. In graphs with other degree distributions, network diameter can increase substantially upon node removal. Also, graphs with power-law degree distributions fragment less easily into large disconnected subnetworks upon random node removal. These findings have led Jeong and collaborators (Jeong et al. 2000) to suggest that metabolic network graphs have power-law distributed degrees because this distribution provides robustness against perturbations.

It is difficult to assess the merit of this hypothesis for metabolic networks directly, for doing so would require comparing large metabolic networks of different structure. However, the ensemble of core metabolic reactions is very similar in most free-living organisms, and thus the global structure of metabolism is highly conserved. In addition, it is not easy to identify (i) the kinds of perturbations to which a metabolic network would have adapted over billions of years, and (ii) the reasons why short path lengths would provide an advantage to the organism. At most, one can venture an informed speculation. For metabolic networks, a possible advantage of small mean path lengths stems from the importance of minimizing transition times between metabolic states in response to environmental changes (Easterby 1986; Schuster and Heinrich 1987; Cascante et al. 1995). Networks with robustly small average path lengths thus might adjust more rapidly to environmental change.

In contrast to the weak case for this selectionist explanation of the degree distribution, there may be a stronger case against it. One might ask whether power-law degree distributions might not be features of many or all large chemical reaction networks, whether part of an organism or not, and regardless of whether they perform any function that benefits from a robust network diameter.
If so, then metabolic network degree distributions would join the club of other power laws (such as Zipf's law of word frequency distribution in natural languages) whose existence does not owe credit to a benefit they provide. Gleiss et al. (2001) have assembled public information on a class of large chemical reaction networks that exist not only outside the living, but on spatial scales many orders of magnitude larger than organisms. These are the chemical reaction networks of planetary atmospheres, networks largely shaped by the photochemistry of their component substrates. The available data stem not only from Earth's atmosphere, but also from other planets of the solar system, including Venus and Jupiter, whose atmospheres are chemically vastly different. These planets' atmospheres have been explored through remote spectroscopic sensing methods and through visits by planetary probes. The chemical reaction networks in these atmospheres, despite being vastly different in chemistry, have a degree distribution consistent with a power law (Gleiss et al. 2001). This suggests that power-law distributions may be very general features of chemical reaction networks. The reasons why we observe them in cellular reaction networks may have nothing to do with the robustness they may provide.

Power-laws and deep time. Metabolic networks have a history. They were not assembled in their present state at once. They have grown, perhaps over billions of years, as organisms increased their metabolic and biosynthetic abilities. Taking this history into account raises a question: How does a network arrive at a power-law degree distribution if it grows? Perhaps the simplest mathematical model capable of growing power-law distributed networks involves only two simple rules (Barabasi et al. 1999). First and unsurprisingly, it adds nodes to a graph.
Second, it connects each new node to previously existing nodes in such a way that already highly connected nodes are more likely to receive a new connection than nodes of lower degree. Over many node additions, a power-law degree distribution emerges. Many variations on this model have been proposed (reviewed in Albert and Barabasi 2001). They differ greatly in detail but retain in one way or another the rule that new connections preferentially involve highly connected nodes. More importantly, many of these models make a key prediction: Highly connected nodes are old nodes, nodes that were added very early in a network's history. We may never know enough about the history of life and metabolism to distinguish between different ways in which metabolism might have grown. However, we can address this latter prediction, common to many different growth models.

Are highly connected metabolites old metabolites? The answer will contain a speculative element, because the oldest metabolites are those that arose in the earliest days of the living, close to life's origins. Also, that life forms as different as bacteria and humans have very similar metabolic structure suggests that the growth of metabolism was essentially complete by the time the common ancestor of extant life emerged. The detailed structure of metabolism at this early time may remain in the dark forever. However, origin-of-life hypotheses make some clear predictions about the chemical compounds expected to have been part of early organisms. There are several of these hypotheses, and they are complementary in the respect most important here: They emphasize the origins of different aspects of life's chemistry. Some emphasize the origins of early genetic material (RNA). Others make postulates about the composition of the earliest proteins. Yet others ask about the earliest metabolites in energy metabolism.
Each of them makes a statement about a different aspect of early life's chemistry.

Figure 2 shows the twelve most highly connected metabolites of the E. coli metabolic network graph. Every single one of them was part of early organisms according to at least one origin-of-life hypothesis. Colored in green are compounds such as coenzyme A that are thought to have been a part of early RNA-based organisms (Benner et al. 1989). The RNA moieties they contain are present in all organismal lineages. Some compounds in this group, such as tetrahydrofolate and coenzyme A, are thought to have played a role in precellular life that may have taken place on polycationic surfaces. Their merit in this regard is that they are elongate molecules with one anionic terminus. They are therefore able to flexibly tether other molecules to the surface, thus localizing them while simultaneously increasing their potential to react with other compounds (Wächtershäuser 1988). Colored in red in Figure 2 are amino acids that were most likely part of early proteins. This postulate is based on likely scenarios for the early evolution of the genetic code (Kuhn and Waser 1994). Shown in blue are compounds likely to have been a part of the earliest energy and biosynthetic metabolism. Glycolysis and the TCA cycle are perhaps the most ancient metabolic pathways, and several of their intermediates (α-ketoglutarate, succinate, pyruvate, 3-phosphoglycerate) occur in Figure 2 (Benner et al. 1989; Taylor and Coates 1989; Morowitz 1992; Kuhn and Waser 1994; Waddell and Bruce 1995; Lahav 1999).

The potential relation between evolutionary history and connectivity of metabolites corroborates a postulate put forth and defended forcefully by Morowitz (1992), namely that intermediary metabolism recapitulates the evolution of biochemistry. Thus, although the structure of metabolic networks may not be a reflection of their robustness, it may teach us about their history.
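The prediction examined above, that highly connected nodes are old nodes, can be illustrated with a short simulation. The Python sketch below follows the two growth rules described earlier (add a node, then connect it preferentially to nodes of high degree). It is only a generic preferential-attachment toy, not a model of metabolic evolution, and the function name `grow_network` as well as the parameters `n_nodes`, `edges_per_node`, and `seed` are illustrative assumptions.

```python
import random


def grow_network(n_nodes, edges_per_node=2, seed=0):
    """Grow a graph by preferential attachment; return the degree of each node.

    Node indices reflect age: node 0 is the oldest, node n_nodes - 1 the youngest.
    """
    rng = random.Random(seed)
    degree = [0] * n_nodes
    # Every node appears in this list once per incident edge, so a uniform
    # draw from it picks a node with probability proportional to its degree.
    endpoints = []

    # Start from a small, fully connected core.
    core = edges_per_node + 1
    for a in range(core):
        for b in range(a + 1, core):
            degree[a] += 1
            degree[b] += 1
            endpoints += [a, b]

    # Rule 1: add a node. Rule 2: attach it preferentially to high-degree nodes.
    for new in range(core, n_nodes):
        targets = set()
        while len(targets) < edges_per_node:
            targets.add(rng.choice(endpoints))
        for target in targets:
            degree[new] += 1
            degree[target] += 1
            endpoints += [new, target]
    return degree


degrees = grow_network(2000)
tenth = len(degrees) // 10
mean_old = sum(degrees[:tenth]) / tenth      # oldest 10% of nodes
mean_young = sum(degrees[-tenth:]) / tenth   # youngest 10% of nodes

print(f"mean degree, oldest tenth:   {mean_old:.2f}")
print(f"mean degree, youngest tenth: {mean_young:.2f}")
```

In such a simulation the oldest nodes accumulate far more connections than the youngest. This is the pattern that the comparison between highly connected metabolites and origin-of-life hypotheses looks for in metabolism.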
Functional genomic experiments are unearthing the structure of many other genetic networks (Hughes et al. 2000; Uetz et al. 2000; Ito et al. 2001), some of which show a power-law degree distribution (Jeong et al. 2001; Wagner 2001). Perhaps their structure can also teach us important lessons about their ancient history.

Acknowledgments. I gratefully acknowledge financial support through NIH grant